[SPARK-23523] [SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery#20684
gatorsmile wants to merge 3 commits into apache:master
Conversation
    relation.output.filter(a => partColumns.contains(a.name.toLowerCase))
    val attrMap = relation.output.map(_.name).zip(relation.output).toMap
    partitionColumnNames.map { colName =>
      attrMap.getOrElse(colName,
Do we need to consider the case sensitivity when comparing the names? cc @cloud-fan

good catch! LGTM

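For readers following the case-sensitivity question above, here is a minimal sketch of a lookup that normalizes names on both sides before comparing them. It is not the code merged in this PR: `Attr`, `CaseAwareLookup`, `resolve`, and the `caseSensitive` flag are hypothetical names used only for illustration, and `NoSuchElementException` stands in for whatever error the real rule would raise.

```scala
import java.util.Locale

// Hypothetical stand-in for a Catalyst attribute; not Spark's real class.
final case class Attr(name: String)

object CaseAwareLookup {
  // Resolve each partition column name to an attribute of `output`,
  // returning the attributes in partition-column order.
  def resolve(
      partitionColumnNames: Seq[String],
      output: Seq[Attr],
      caseSensitive: Boolean): Seq[Attr] = {
    // Normalize both the map keys and the lookup keys the same way.
    def normalize(s: String): String =
      if (caseSensitive) s else s.toLowerCase(Locale.ROOT)

    val attrMap: Map[String, Attr] = output.map(a => normalize(a.name) -> a).toMap
    val known = output.map(_.name).mkString(", ")

    partitionColumnNames.map { colName =>
      attrMap.getOrElse(
        normalize(colName),
        throw new NoSuchElementException(
          s"Unable to find the column $colName given [$known]"))
    }
  }
}
```

With `caseSensitive = false`, a partition column spelled `COL3` resolves to an attribute named `cOl3`; with `caseSensitive = true` the same lookup fails, which is the difference the review comment is asking about.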
Test build #87707 has finished for PR 20684 at commit

Test build #87700 has finished for PR 20684 at commit

retest this please.

Test build #87708 has finished for PR 20684 at commit

Test build #87709 has finished for PR 20684 at commit

retest this please

Test build #87716 has finished for PR 20684 at commit

Hi, @gatorsmile and @cloud-fan.

We are still waiting for the official announcement of the Spark 2.3 release. This will be merged to 2.3.1 for sure.

I see. Thank you for the confirmation, @gatorsmile!

Gentle ping, @gatorsmile, since 2.3 was officially announced yesterday.
…zeMetadataOnlyQuery
## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
.write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```
It generates a wrong result (the selected columns `CoL1`, `CoL5`, `CoL3` should yield `[a,e,c]`):
```
[c,e,a]
```
We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR fixes it.
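To illustrate why attribute order matters here, the following is a minimal sketch in plain Scala with no Spark dependency. It assumes the wrong result comes from pairing partition values with attributes listed in a different order. `AttributeOrderSketch`, `Attribute`, `byFilter`, and `byLookup` are hypothetical names, and the `output` order used in the usage note is made up for the example. `byFilter` mirrors the `filter`-based line quoted in the review thread and `byLookup` mirrors the `attrMap` lookup, but neither is the actual rule.

```scala
object AttributeOrderSketch {
  // Hypothetical stand-in for a Catalyst attribute; not Spark's real class.
  final case class Attribute(name: String)

  // Filtering the leaf node's output keeps the order of `output`,
  // regardless of the order of `partitionColumnNames`.
  def byFilter(partitionColumnNames: Seq[String], output: Seq[Attribute]): Seq[Attribute] = {
    val wanted = partitionColumnNames.toSet
    output.filter(a => wanted.contains(a.name))
  }

  // Looking up each partition column by name yields one attribute per
  // partition column, in partition-column order.
  def byLookup(partitionColumnNames: Seq[String], output: Seq[Attribute]): Seq[Attribute] = {
    val attrMap = output.map(a => a.name -> a).toMap
    partitionColumnNames.flatMap(attrMap.get)
  }

  // Pair each attribute name with the partition value at the same position.
  def pair(attrs: Seq[Attribute], values: Seq[String]): Seq[(String, String)] =
    attrs.map(_.name).zip(values)
}
```

For example, with partition columns `Seq("cOl3", "cOl1", "cOl5")`, partition values `Seq("c", "a", "e")`, and an assumed output order of `cOl1, cOl3, cOl5`, `pair(byFilter(...), values)` binds `cOl1` to `c`, while `pair(byLookup(...), values)` binds `cOl1` to `a`.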
## How was this patch tested?
Added a test case
Author: gatorsmile <gatorsmile@gmail.com>
Closes apache#20684 from gatorsmile/optimizeMetadataOnly.
…he rule OptimizeMetadataOnlyQuery

This PR is to backport #20684 and #20693 to Spark 2.3 branch

---

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
.write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```
It generates a wrong result.
```
[c,e,a]
```
We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.
## How was this patch tested?
Added a test case
Author: Xingbo Jiang <xingbo.jiang@databricks.com>
Author: gatorsmile <gatorsmile@gmail.com>
Closes #20763 from gatorsmile/backport23523.